5 research outputs found

    Semantics, analysis and security of backtracking regular expression matchers

    Get PDF
    Regular expressions are ubiquitous in computer science. Originally defined by Kleene in 1956, they have become a staple of the computer science undergraduate curriculum. Practical applications of regular expressions are numerous, ranging from compiler construction through smart text editors to network intrusion detection systems. Despite having been vigorously studied and formalized in many ways, recent practical implementations of regular expressions have drawn criticism for their use of a non-standard backtracking algorithm. In this research, we investigate the reasons for this deviation and develop a semantics view of regular expressions that formalizes the backtracking paradigm. In the process we discover a novel static analysis capable of detecting exponential runtime vulnerabilities; an extremely undesired reality of backtracking regular expression matchers

    Regular Expression Matching and Operational Semantics

    Full text link
    Many programming languages and tools, ranging from grep to the Java String library, contain regular expression matchers. Rather than first translating a regular expression into a deterministic finite automaton, such implementations typically match the regular expression on the fly. Thus they can be seen as virtual machines interpreting the regular expression much as if it were a program with some non-deterministic constructs such as the Kleene star. We formalize this implementation technique for regular expression matching using operational semantics. Specifically, we derive a series of abstract machines, moving from the abstract definition of matching to increasingly realistic machines. First a continuation is added to the operational semantics to describe what remains to be matched after the current expression. Next, we represent the expression as a data structure using pointers, which enables redundant searches to be eliminated via testing for pointer equality. From there, we arrive both at Thompson's lockstep construction and a machine that performs some operations in parallel, suitable for implementation on a large number of cores, such as a GPU. We formalize the parallel machine using process algebra and report some preliminary experiments with an implementation on a graphics processor using CUDA.Comment: In Proceedings SOS 2011, arXiv:1108.279

    Static analysis for regular expression exponential runtime via substructural logics

    No full text
    Abstract. Regular expression matching using backtracking can have exponential runtime, leading to an algorithmic complexity attack known as REDoS in the systems security literature. In this paper, we present a static analysis that detects whether a given regular expression can have exponential runtime for some inputs. The analysis works by forming powers and products of transition relations and thereby reducing the REDoS problem to reachability. The correctness of the analysis is proved using a substructural calculus of search trees, where the branching of the tree causing exponential blowup is characterized as a form of nonlinearity

    Static Analysis for Regular Expression Denial-of-Service Attacks

    No full text
    Abstract. Regular expressions are a concise yet expressive language for expressing patterns. For instance, in networked software, they are used for input validation and intrusion detection. Yet some widely deployed regular expression matchers based on backtracking are themselves vulnerable to denial-of-service attacks, since their runtime can be exponential for certain input strings. This paper presents a static analysis for detecting such vulnerable regular expressions. The running time of the analysis compares favourably with tools based on fuzzing, that is, randomly generating inputs and measuring how long matching them takes. Unlike fuzzers, the analysis pinpoints the source of the vulnerability and generates possible malicious inputs for programmers to use in security machines. Testing the analysis on two large repositories of regular expressions shows that the analysis is able to find significant numbers of vulnerable regular expressions in a matter of seconds.
    corecore